Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes

Authors: Fisher Matthew; Li Manyi; Patil Akshay Gadi; Patil Supriya Gadi; Savva Manolis; Zhang Hao
Publication date: 21/08/2023

This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, various indoor scene datasets available for research in the aforementioned areas, and discuss notable works employing machine learning models for such scene modeling tasks based on these representations. Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With respect to analysis, we focus on four basic scene understanding tasks: 3D object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene similarity. And for synthesis, we mainly discuss neural scene synthesis works, though also highlighting model-driven methods that allow for human-centric, progressive scene synthesis. We identify the challenges involved in modeling scenes for these tasks and the kind of machinery that needs to be developed to adapt to the data representation, and the task setting in general. For each of these tasks, we provide a comprehensive summary of the state-of-the-art works across different axes such as the choice of data representation, backbone, evaluation metric, input, output, etc., providing an organized review of the literature. Towards the end, we discuss some interesting research directions that have the potential to make a direct impact on the way users interact and engage with these virtual scene models, making them an integral part of the metaverse.

Comment: Published in Computer Graphics Forum, Aug 202

arXiv.org e-Print Archive

Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images

Authors: Patil Akshay Gadi; Wang Ruiqi; Yu Fenggen; Zhang Hao
Publication date: 27/11/2023

We introduce the first active learning (AL) framework for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes. As with most human-in-the-loop approaches, the key criterion for success in AL is to minimize human effort while still attaining high performance. To this end, we employ a transformer that utilizes a masked-attention mechanism to supervise the active segmentation. To enhance the network tailored to moveable parts, we introduce a coarse-to-fine AL approach which first uses an object-aware masked attention and then a pose-aware one, leveraging the hierarchical nature of the problem and a correlation between moveable parts and object poses and interaction directions. Our method achieves close to fully accurate (96% and higher) segmentation results, with semantic labels, on real images, with 82% time saving over manual effort, where the training data consists of only 11.45% annotated real photographs. Finally, we contribute a dataset of 2,550 real photographs with annotated moveable parts, demonstrating its superior quality and diversity over the current best alternatives.

arXiv.org e-Print Archive

RoSI: Recovering 3D Shape Interiors from Few Articulation Images

Authors: Bennett Eric; Jackson Brian; Patil Akshay Gadi; Qian Yiming; Yang Shan; Zhang Hao
Publication date: 13/04/2023

The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and are missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only a few views, our method infers the interior planes that are observable in the input images. Our neural architecture is trained in a category-agnostic manner and it consists of a motion-aware multi-view analysis phase including pose, depth, and motion estimations, followed by interior plane detection in images and 3D space, and finally multi-view plane fusion. In addition, our method also predicts part articulations and is able to realize and even extrapolate the captured motions on the target 3D object. We evaluate our method by quantitative and qualitative comparisons to baselines and alternative solutions, as well as testing on untrained object categories and real image inputs to assess its generalization capabilities.

arXiv.org e-Print Archive

Signal processing approaches for multimedia application

Author: Patil Akshay Gadi
Publication venue: Indian Institute of Technology Gandhinagar
Publication date: 01/01/2016

by Akshay Gadi Patil, M.Tech

IIT Gandhinagar

Tone mapping HDR images using local texture and brightness measures

Authors: Patil Akshay Gadi; Raman Shanmuganathan
Publication venue: CVIP
Publication date: 01/02/2016

by Akshay Gadi Patil and Shanmuganathan Raman

IIT Gandhinagar

Automatic content-aware non-photorealistic rendering of images

Authors: Patil Akshay Gadi; Raman Shanmuganathan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Non-photorealistic rendering techniques work on image features and often manipulate a set of characteristics such as edges and texture to achieve a desired depiction of the scene. Most computational photography methods decompose an image using edge preserving filters and work on the resulting base and detail layers independently to achieve desired visual effects. We propose a new approach for content-aware non-photorealistic rendering of images where we manipulate the visually salient and non-salient regions separately. We propose a novel content-aware framework in order to render an image for applications such as detail exaggeration, artificial smoothing, and image abstraction. The processed regions of the image are blended seamlessly with the rest of the image for all these applications. We demonstrate that content awareness of the proposed method leads to automatic generation of non-photorealistic rendering of the same image for the different applications mentioned above.

by Akshay Gadi Patil and Shanmuganathan Raman

arXiv.org e-Print Archive

IIT Gandhinagar